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THE GENERALIZED SHRINKAGE ESTIMATOR FOR THE 
ANALYSIS OF FUNCTIONAL CONNECTIVITY OF 
BRAIN SIGNALS 1 

By Mark Fiecas and Hernando Ombao 

Brown University 

We develop a new statistical method for estimating functional 
connectivity between neurophysiological signals represented by a mul- 
tivariate time series. We use partial coherence as the measure of func- 
tional connectivity. Partial coherence identifies the frequency bands 
that drive the direct linear association between any pair of channels. 
To estimate partial coherence, one would first need an estimate of 
the spectral density matrix of the multivariate time series. Paramet- 
ric estimators of the spectral density matrix provide good frequency 
resolution but could be sensitive when the parametric model is mis- 
specified. Smoothing-based nonparametric estimators are robust to 
model misspecification and are consistent but may have poor fre- 
quency resolution. In this work, we develop the generalized shrink- 
age estimator, which is a weighted average of a parametric estimator 
and a nonparametric estimator. The optimal weights are frequency- 
specific and derived under the quadratic risk criterion so that the 
estimator, either the parametric estimator or the nonparametric esti- 
mator, that performs better at a particular frequency receives heavier 
weight. We validate the proposed estimator in a simulation study and 
apply it on electroencephalogram recordings from a visual-motor ex- 
periment. 

1. Introduction. The goal of this paper is to estimate dependence be- 
tween multi-channel electroencephalogram (EEG) signals. In the time do- 
main, partial cross-correlation is used to measure the strength of direct lin- 
ear dependence between a pair of channels. In the frequency domain, partial 
coherence is utilized to identify the frequency bands that drive the direct 
linear association. To estimate partial coherence, one first needs to esti- 
mate the spectral density matrix, which is done via a parametric method 
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(by fitting a parametric model to the EEGs) or by some nonparametric 
procedure such as kernel-smoothing. Both of these procedures have their 
strengths and weaknesses. In this paper we develop a generalized shrinkage 
procedure which is a weighted average of the parametric and nonparamet- 
ric estimates. The frequency-specific weights are derived data-adaptively so 
that the estimator (parametric versus nonparametric) that performs better 
at a particular frequency receives heavier weight. 

To obtain a parametric estimate of the spectral density matrix, one can 
fit a model, such as the vector autoregressive model (VAR), which has been 
applied extensively in the analysis of a variety of brain signals [e.g., Goebel 
et al. (2003); Eichler (2005); Schlbgl and Suppa (2006); Thompson and Siegle 
(2009)]. It is simple and can be easily applied for assessing Granger-causality 
and the direction of information between the signals [Kaminski and Bli- 
nowska (1991); Kaminski et al. (2001)]. Moreover, when the lag order is 
sufficiently large, the VAR estimates are known to be well localized in fre- 
quency. Nonparametric estimators are derived from periodogram matrices 
which are the data analog of the spectral density matrix. One nonparametric 
estimator is obtained by smoothing the periodograms across frequencies. The 
performance of these estimators is a function of the smoothing parameter 
and so these estimators may not always give well-localized estimates. How- 
ever, they are asymptotically mean squared consistent and robust to model 
specification. In this work we develop the generalized shrinkage estimator 
which is a weighted average of the parametric estimator and the nonpara- 
metric estimator and thus provides a good compromise between these two 
estimators. 

The remainder of the paper is organized as follows. In Section 2 we discuss 
partial coherence and the past works on shrinkage estimation. Section 3 lays 
out the framework for the generalized shrinkage estimator. In Section 4 we 
show the performance of the generalized shrinkage estimator relative to the 
VAR estimator, smoothed periodogram and the multitaper on a simulated 
data set. In Section 5 we use the generalized shrinkage estimator to analyze 
functional connectivity on an EEG data set. And finally, in Section 6, we 
summarize the conclusions of this research and briefly discuss properties of 
the generalized shrinkage estimator and future directions. 

2. Background. Coherence, the frequency domain analog of cross-corre- 
lation, is a temporally invariant frequency-specific measure of linear asso- 
ciation between signals. Consider a trivariate time series (X(t),Y(t), Z(t)) T . 
Denote X u (t), Y w (f) and Z u (t) to be the bandpass-filtered signals so that 
each of their spectra is concentrated on some narrow frequency band around co 
Ombao and van Bellegem (2008) showed that coherence has an appealing 
interpretation of being the squared absolute cross-correlation between a pair 
of bandpass filtered signals, so that in this case, the coherence between X(t) 
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and Y(t) at frequency lu can be interpreted as the squared cross-correlation 
between X u (t) and However, if the linear association between X(t) 

and Y(t) at frequency lu is confounded by Z(t), conclusions based on the 
associations between X(t) and Y(t) may be misleading and misinterpreted 
because they may be related only indirectly through Z(t). In other words, 
coherence does not model direct linear association. We can obtain a di- 
rect measure of frequency-specific linear association using partial coher- 
ence, which is interpreted as the squared cross-correlation between X u (t) 
and Y u (t) at frequency lu after removing the temporally invariant linear ef- 
fects of Z(t). To estimate partial coherence, we use a characterization that 
expresses partial coherence as a function of the inverse of the spectral den- 
sity matrix [Dahlhaus (2000)]. In fact, this characterization was used in 
Eichler, Dahlhaus and Sandkuhler (2003) for neural spike trains, Salvador 
et al. (2005) for fMRI, and Medkour, Walden and Burgess (2009) for EEC 

2.1. Characterization of partial coherence. Let ~K(t) = (X±(t), . . . ,Xp(t)) T 
be a P-dimensional weakly stationary zero-mean real-valued discrete time 
series with spectral density matrix {(lu). The diagonal elements of ((lu), de- 
noted f pp (ou), p = 1, . . . , P, are the autospectra of the P channels and each 
of the off-diagonal elements, denoted / M (w),p^ q, is the cross-spectrum be- 
tween channels X p (t) and X q (t). Define the matrix g(w) = f _1 (w) whose 
(p, q)th element is denoted as g pq (uj). Let h(w) be a diagonal matrix whose 

elements are g P p^ 2 (uj). Define the matrix T(lu) to be 
(2.1) T(u) = -h(u)g{u)h(u). 

Then the partial coherence p pq (w) between the p and qth. channels at fre- 
quency lu is the square of the modulus of the (p,q)th element of T(lu), that 
is, p P q(uj) = \T pq (Lu)\ 2 . To estimate partial coherence, we must first estimate 
the spectral density matrix. 

2.2. Related work on shrinkage estimation. Ledoit and Wolf (2004) pro- 
posed shrinkage estimation of the variance-covariance matrix that com- 
bines a "classical estimator" (the sample variance-covariance matrix) with 
a "highly-structured target." Recently, the idea of a convex combination of 
a "classical estimator" with a target has been extended for estimating the 
spectral density matrix of a multivariate time series, which is the frequency- 
domain analog of the variance-covariance matrix. Bdhm and von Sachs 
(2009) developed the shrinkage estimator for the spectral density matrix 
which shrinks the nonparametric estimator to the scaled identity matrix. 
When shrinking the nonparametric estimator toward the scaled identity ma- 
trix the resulting estimate is well conditioned. However, the off-diagonals of 
the estimator is shrunk to and thus potentially biases the estimates of lin- 
ear association toward the null. To overcome this problem, one may shrink 
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the nonparametric estimator toward a more general shrinkage target. For 
factor models in economic time series, Bohm and von Sachs (2008) pro- 
posed to shrink the nonparametric estimator toward a structured model, 
namely, the one-factor model. Here, we extend these works by giving the 
shrinkage weight for any arbitrary shrinkage target. 

3. The generalized shrinkage estimator. Let X n (t), n = 1, 2, . . . , N and 

t = 1,2, . . . ,T, be the nth trial of a P-dimensional weakly stationary zero- 
mean real- valued discrete time series with auto-covariance matrix X(/i), each 
of whose elements is absolutely summable. We shall assume that the trials 
are independent realizations from a common underlying process whose spec- 
tral properties (in particular, partial coherence) we wish to investigate. To 
achieve the goal of estimating partial coherence, we shall first estimate the 
P x P spectral density matrix defined by f(w) = ^ ^ hgZ £(/i) exp(— iwfo). 

Denote the parametric estimator of f(w) to be V(cj) and the nonnpara- 
metric estimator to be f(w). The generalized shrinkage estimator takes the 
weighted average of these two estimators and so takes the form 

(3.1) f*(w) = W T (w)V(u) + (1 - W T (w))f(u). 

Our procedure will use the class of vector autoregressive (VAR) models 
whose order is selected by the Bayesian information criterion (BIC) for 
the parametric estimator and computes the nonparametric estimator with 
smoothing spans selected from a plug-in unbiased risk estimation criterion. 
The weight at a particular frequency is estimated data-adaptively so that 
a heavier weight is given to the estimator that gives a better fit at that fre- 
quency. We first describe the three components of the generalized shrinkage 
estimator before describing the estimation procedure. 

3.1. Component 1: The parametric estimator. 

3.1.1. The vector autoregressive process. Here, we use the class of vector 
autoregressive models for obtaining a parametric estimator for the spectral 
density matrix. A multivariate time series X(i) has a VAR(E") representa- 
tion if X(i) = J2k=i $fc x (* ~k) + Z(t), where the $ fc 's are P x P coefficient 
matrices, Z(i) is white noise with covariance matrix T,z, and other regularity 
conditions are satisfied [Brockwell and Davis (1998)]. The spectral density 
matrix for the VAR(J^) time series takes the form 

V(u)) = — {(Ip - <&i exp(-iwl) ®k exp(-iwK))} _1 x Y> z 

(3.2) ^ 

x {(Ip - $iexp(-iwl) §Kexp{-iojK))*} , 

where Ip is the P x P identity matrix and * denotes the complex conju- 
gate transpose. The order K can be determined using some model selection 
criterion (such as BIC). 



THE GENERALIZED SHRINKAGE ESTIMATOR 



5 



3.1.2. N -trial least squares estimation. We now describe how to obtain 
the least squares estimates of the coefficients of a VAR(K) model for a mul- 
tivariate time series recorded from N trials. The time series for the nth trial 
is modeled as 

K 

(3.3) X n {t) = Y J $kXn(t-k) + Z n (t), 

k=l 

where X n (t) = (X ln (t), . . .,X Pn (t)) T , Z n (t) = (Z ln (t), . . . , Z Pn {t)) T , t = 1, . . . , 
T, and n = 1, . . . , N. We extend the estimation for a single-trial multivariate 
time series in Liitkepohl (1993). Suppose we have K many presample val- 
ues X n (— K + 1), . . . , X n (0) for each trial of this time series. For each trial 
n = 1, . . . N , define 

(3.4) X n = (X n (l),...,X n (T)) (PxT), 

(3.5) B = ($i,..., $k) (PxPK), 

( X n (t) \ 

(3.6) Y n ,i = : (PK x 1), 

W n (t- K + l)J 

(3.7) Y n = (Y n ,o,...,Y n , T -i) (PKxT), 

(3.8) Z n = (Z n (l),...,Z n (T)) (PxT). 

Now let bj be the kth row of B and X n ( fc ) = (Xfc n (l), . . . , Xfc n (T)) T and 
Z„ )(jfc ) = (Z kn (l),...,Z kn (T)) T . With this notation, the nth trial VAR(K) 
model given by equation (3.3) can be written as X n ( fc ) = Yj&& + Z n ( fc ). 
Liitkepohl (1993) gave the solution in case N = 1 and he showed that the 
solution is equivalent to OLS estimation. Now for TV > 1, the setting be- 
comes analogous to that of repeatedly measured multivariate data. From this 
perspective, one can see that the least squares estimator for the VAR(K) 
model with N trials is b k = (J2n=i Y n^I)~ 1 (J2n=i Y n X n,(fc))- Suppose that 
NT > PK. Our estimate of S z = E(Z(i)Z(i) T ) is 

1 N 

(3.9) S Z = — — ^{(X n - BY n )(X n - BY„) T }. 

n=l 

The degrees of freedom adjustment is due to the PK many coefficients in 
each of the P equations in equation (3.3). 

Note that in equation (3.3), there are KP 2 + P(P + l)/2 parameters to 
estimate. However, it is not unusual for the number of components of the 
time series to be large and the number of time points per trial to be small. 
Therefore, it is not efficient to estimate the parameters using only a single 
trial. Here, we have developed a method for estimating the parameters by 
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pooling the data over all of the trials. This is valid if one assumes that 
the data from each trial is a realization of a common underlying VAR(K) 
process. 

The least squares estimation procedure above requires the VAR order K 
to be known. To select the order K in an iV-trial multivariate time series 
framework, we use the information criterion function IC(re) = log |Sx(k)| + 
Pen(T, N, P, n), where is the estimate of T,z obtained after fitting 

a VAR(k) model and Pen(T, N, P, k) is some penalty function for complex- 
ity. Here, we use the penalty function in BIC, which is Pen(T, N, P, k) = 
^^kP 2 . The optimal order K is selected so that K = argmin K IC(/c). 

3.2. Component 2: The nonparametric estimator. 

3.2.1. The smoothed periodogram matrix. Let dx. n (w) = ^= £t=i Xn(^) x 
exp(— iut) be the discrete Fourier transform of X n (t). The nth trial raw pe- 
riodogram is defined to be I n (w) = ^dx, n (w)dx n (w). Kernel smoothing is 
a common method for estimating the spectral density matrix. Let wt{oc) 
be a kernel (weight) function that has smoothing span Mt such that, as 
T — > oo, Mt — y oo but Mt/T —¥ 0. We compute the nth trial estimate of 
the (j,k)th element of f(w) with f n ,jk(u) = / »r(w — a)I n jk(ct) da. Un- 
der regularity conditions [Brillinger (2001)], this estimate is an element-wise 
consistent estimator for f(w). The final estimate for f(w) using all of the 
trials is the average over the trials, namely, f(w) = A/" -1 5^ n=1 f n (w). Sim- 
ilarly, the elements of f(w) are consistent for the elements of f(w). Note 
that in this setup, we can apply minimal smoothing per trial because the 
final nonparametric estimator f(w) undergoes further smoothing due to the 
averaging across replicated trials. 

3.2.2. Automatic selection of the optimal smoothing span. When My is 
too small, the resulting estimate can capture very localized peaks but may 
be too erratic. Conversely, if it is too large, the resulting estimate will be 
very smooth but may miss vital peaks that characterize the process. Optimal 
smoothing spans have been studied for univariate time series. For example, 
the approach by Ombao et al. (2001) used the full likelihood by minimiz- 
ing the gamma deviance. Here, inspired by Lee (1997) and Lee (2001), we 
develop an approach for span selection based on the quadratic risk function. 

Define the integrated risk function for a smoothing span h to be R(h) = 
JqK(\\{(cj) — fh(uj)\\ 2 ) du, where f/i(w) is the periodogram matrix smoothed 
with a kernel having span size h and || • || 2 is the normalized Hilbert-Schmidt 
norm defined by ||^4|| 2 = P" 1 tv(AA*). The goal is to pick Mt so that 
Mt = argmin^ R(h). Such an Mt is the global optimal smoothing span. 
However, we cannot evaluate R[h) because f(w) is unknown. An approach 
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in the univariate time series context is to use a plug-in unbiased risk estima- 
tion (PURE) procedure by plugging in a pilot estimator f p iiot(w) for f(w), 
where the pilot estimator is the periodogram smoothed by a kernel with an 
arbitrarily picked smoothing span. This gives an asymptotically unbiased 
estimate of R(h). 

The proposed procedure for obtaining the optimal smoothing span for 
the nth trial of a multivariate time series is as follows. We combine the 
PURE procedure with a leave-one-out procedure. Let f n h( w ) be the nth 
trial periodogram smoothed with smoothing span h. Let fv_ n )(u;) = (N — 
^j^nhi^)- Since each is an approximately unbiased esti- 

mator of f(w), then fV n )(cj) is an unbiased estimate of f(w) and thus 
will serve as the pilot estimator. Then the nth trial integrated PURE is 
Rn(h) = Jq ||f(_ n )(o;) — f n: h{u)\\ 2 dw. We proceed by picking the smoothing 

span Mj? for the nth trial so that = argmm/j R n {h). 

3.3. Component 3: The theoretical shrinkage weight. Ideally, the frequen- 
cy-specific shrinkage weight corresponding to the parametric estimator, de- 
noted Wt(u), should be greater than 0.5 at frequencies where the parametric 
estimate gives a better fit than the nonparametric estimate. In fact, as we 
will see later, the shrinkage weight is a function of the mean squared error 
of each of the parametric and nonparametric estimators. We formalize this 
in the following discussion. 

First, we introduce the notation t^(oj) = E(fr(w)), where fr(w) is the 
smoothed periodogram. For ease, we drop the subscript T in the nota- 
tion. From Brillinger (2001), f °(cj) - f (w) -> as T -> oo. Here, f will 
serve as the proxy for f(w) and will be utilized in deriving the weights 
for the generalized shrinkage estimator. Define the squared error loss func- 
tion £(f*(w), f °(uj)) = ||f*(w) — f°(w)|| 2 , where the norm is the normal- 
ized Hilbert-Schmidt norm. Then the risk function is lZ(f*(u), f °(w)) = 
E(£(f*(w), f °(u;))). The optimal shrinkage weight Wt(w) is the minimizer 
of the risk function. First, define 

(3.10) a 2 T {oj) = MSE(V(w)) = E(||V(w) - f °(w)|| 2 ), 

(3.11) = MSE(f (w)) = E(||f(w) - f <V)H 2 ) 
and 

(3.12) ^( W ) = E(||V( w )-f( w )|| 2 ). 

Then it can be easily shown the optimal shrinkage weight is 

(3.13) W T ( U ) = ^ • 

This result generalizes that given by both Bohm and von Sachs (2008), 
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whose shrinkage target was that of a one-factor model, and Bohm and von 
Sachs (2009), whose shrinkage target was the scaled identity matrix. 

Remarks. Upon inspection of the optimal shrinkage weight equa- 
tion (3.13), one can obtain insight that the generalized shrinkage estima- 
tor behaves analogous to the Bayes estimators, where the prior estimator 
is given by the parametric estimator and the data-driven estimator is given 
by the nonparametric estimator. Note that the behavior of the shrinkage 
weight is a function of the relative performance of each of the parametric 
and nonparametric estimators. In particular, if the parametric estimator 
models the spectral density matrix well so that a^(u) — > at a rate much 
faster than /3£(w) — > 0, then the shrinkage weight will shift toward the para- 
metric estimator; otherwise, the shrinkage weight will shift toward the non- 
parametric estimator. The second term of the numerator of equation (3.13) 
corrects for the correlation between the parametric estimator and the non- 
parametric estimator. Empirical Bayes estimators use the data to construct 
the prior estimator. Our approach is analagous in the sense that we use the 
same data to construct the parametric and nonparametric estimators, and 
so the second term takes into account that the two estimators are highly 
likely to be correlated. Back to equation (3.13), we see that the denomina- 
tor makes the generalized shrinkage estimator robust to the misspecification 
of the parametric estimator. The denominator is the squared distance be- 
tween the parametric and nonparametric estimators. So if the parametric 
and nonparametric estimators are vastly different from each other, then the 
denominator of the weight will be large, and so the weight will be larger for 
the nonparametric estimator. 

3.4. Constructing the generalized shrinkage estimator. The parametric 
and nonparametric components of the generalized shrinkage estimator can 
be obtained using standard procedures. To estimate the shrinkage weight, 
we need to construct an estimate of each of a^(w) and /3|i(cj), and d^(u). 

Each of a^(uj) and f3^(u) is the expected distance from their respec- 
tive estimator with f °(uj), and so we need to provide an estimate of f (w). 
An asymptotically unbiased estimator of f °(w) is the average of the peri- 

odograms: f°(w) = N _1 J2n=i ^n(w). Consider f3^(ui). Since f(uj) is an unbi- 
ased estimator of f °(o;) by construction, then /3|i(w) is the sum over the vari- 
ances of each of the elements of f(w). Assuming that each element of f (w) 
varies slowly over frequency, our proposed estimator is aj;ype of sample 
variance, that is, we look at the window of size Ct around f(w) for each w: 

(CW)/2 

(3.14) $(u>) = Cf 1 Y, ||fH-f°( W + ^)|| 2 . 

k=-(C T -l)/2 
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Our simulation studies indicate the choice of this procedure is robust over 
a wide choice for Cy. We estimate a^(w) using a type of plug-in estimator 
smoothed over frequencies: 

(C T -l)/2 

(3.15) a 2 T (u:) = C T l Y. ||VH-fV + Wfc )|| 2 . 

fe=-(C T -l)/2 

Our proposed estimator for 5^(oo) is constructed in a similar manner: 

(C T -l)/2 

c? 1 (II% + ^)-vmh 2 



(3.16) 



fc=-(C T -l)/2 

+ ||v( w + Wfc )-?H|| 2 ) 



Then we can obtain a plug-in estimate of the optimal shrinkage weight using 

/3 2 (w) - 0.5(q2,(w) + /3 2 (w) - 



(3.17) Wr(w) 



4(w) 



Note that due to estimation error in obtaining a^(uj), (3^(u)) and S^(u), it is 
possible for our estimate of the shrinkage weight to fall outside the interval 
[0,1]. If this occurs, we truncate the estimated weight to or 1. Finally, 
we plug in the estimated weight to obtain an estimate of the generalized 
shrinkage estimator: 

(3.18) f*(w) = Wt(w)V(w) + (1 - W T (u))f(u). 

4. Simulation study. We now compare the performance of the general- 
ized shrinkage estimator against competitors which are, namely, the VAR 
estimator whose order is selected using the BIC criteria, the smoothed peri- 
odogram and the multitaper [Percival and Walden (1993)]. The VAR estima- 
tor was estimated as described in Section 3.1. The smoothed periodogram, as 
described in Section 3.2, is obtained using the Hann kernel whose smoothing 
span is objectively determined by the quadratic risk criterion described in 
Section 3.2.2. The multitaper estimator was obtained by averaging the per- 
trial multitaper estimates, which is similar to how we obtained the smoothed 
periodogram estimator. The optimal number of tapers was also objectively 
determined using a PURE procedure similar to that described in Section 3.2. 
The estimators are compared using the mean squared error (MSE). 

The true underlying process was a sum of a VAR(5) process and a first- 
order vector moving average (VMA) process. The two processes were gener- 
ated independently so that the underlying spectral density matrix is the sum 



10 



M. FIECAS AND H. OMBAO 



of that of the VAR and the VMA. The vector time series had P = 12 dimen- 
sions, N = 120 trials, and T = 256 time points. The values of the parameters 
are given in the Appendix. 

The results of the simulation study are shown in Figure 1. Under this 
setting, it is obvious that the VAR parametric estimator was incorrectly 
specified. However, as just noted earlier, a VAR can capture the peaks in 
the spectral density matrix. Note that while the VAR estimator is the best 
estimator of the spectral density matrix in the sense that it accurately cap- 
tured the peaks in the autospectra, it was not necessarily the best estimator 
of partial coherence (because the latter is a highly nonlinear function of the 
former). This can be seen in the high frequencies as shown in Figure 1. In 
other words, the best estimator of the spectral density matrix is not necessar- 
ily the best estimator for partial coherence. The nonparametric estimators 
performed poorly relative to the parametric VAR estimator in estimating 
the spectral density matrix because each oversmoothed the peaks in the au- 
tospectra. The generalized shrinkage estimator performed well in estimating 
both the spectral density matrix and partial coherence. It is clear that the 
generalized shrinkage estimator borrowed strength from the parametric VAR 
estimator to better capture the peaks in the autospectra so that it estimated 
the spectral density matrix much better than the nonparametric estimators. 
This can be seen by the shrinkage weight, as shown in Figure 2. The shrink- 
age weight is near 1.0 at frequencies near the location of the peaks of the 
autospectra. 

5. Functional connectivity for EEG. The study of functional connectiv- 
ity of brain signals is the investigation of the dependencies between brain 
signals that have been measured from spatially separated regions of the brain 
[Friston et al. (1993)]. We investigated the dependency structure in the EEG 
signals between certain regions and how it differs across experimental con- 
ditions in a visual-motor task. 

5.1. Data description and preprocessing. The EEG data were recorded 
from the scalp using a 64-channel EEG system (EMS, Biomed, Korneuburg, 
Germany). The electrodes were applied to the scalp using the standard In- 
ternational 10-20 system with a reference electrode on the nose-tip. The 
EEG signals were recorded at 512 samples/second/channel and filtered us- 
ing a high-pass filter of 0.02 Hz and a low-pass filter of 100 Hz. 

Participants of the study were required to make quick displacements of 
a hand-held joystick from a central position either to the right or to the left 
from center as instructed by a visual cue. The visual cue randomly selected 
the movement per trial. From a standard montage of 64 scalp electrodes, our 
neuroscientist collaborator selected a subset of P = 12 channels that were 
presumed, based on published studies, to be recording the relevant neural 
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Fig. 1. Mean squared error estimated via Monte Carlo for the simulation, (a) Frequen- 
cy-specific mean squared error of each estimate of the spectral density matrix, (b) Frequen- 
cy-specific mean squared error of each estimate of the partial coherence matrix. 
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Fig. 2. Shrinkage weight for the generalized shinkage estimator estimated via Monte 
Carlo for the simulation. 

processes for these visual-motor actions [Marconi et al. (2001); Bedard and 
Sanes (2009)]. These electrode sites include the fronto-central leads (FC) to 
measure activity related to premotor processing, the central leads (C) to 
measure activity related to motor performance, and the parietal (P) and 
occipital (O) leads to measure activity related to visual-motor transforma- 
tions. We did not add more electrodes in the analysis because this will 
present unnecessary computing and modeling complications, which we will 
later discuss. The relative locations of these 12 electrode sites are shown in 
Figure 3. 

The methods that we have developed in this work have been for single- 
subject analyses, and so here, we show results for only one subject. We 
analyze the first 0.5 seconds from stimulus onset of the EEG signals, yielding 
a multivariate time series having length T = 256 per trial, and there were 
A^l = 118 and TVr = 138 trials for the leftward and rightward movements 
respectively. For the analysis, we removed the linear and quadratic trends 
from the subject's EEGs. The EEGs were further filtered using a 4th-order 
low-pass Butterworth filter with stopband at 50 Hz and then standardized to 
have unit variance. Figure 4 illustrates time plots of the P = 12 filtered and 
detrended EEG signals obtained from a representative participant during 
leftward and rightward joystick movements. 

5.2. Details on the statistical procedure. Our aim in this work was two- 
fold: first, to estimate the strength of functional connectivity as measured 
by partial coherence, and second, to identify which connections differentiate 
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Fig. 3. Relative placement of the twelve EEG channels preselected for the analysis. These 
channels were presumed to be recording the relevant neural processes for the visual-motor 
task in the experiment. 

between the "left" and the "right" trials. The parametric component was 
computed using a VAR(19) model where the order was selected using the 
BIC criterion and the parameters were estimated using the iV-trials least 
squares procedure. The nonparametric component was obtained using the 

Left Condition 




Time (seconds) 



Fig. 4. Representative 12- channel filtered and detrended EEG recorded from one trial for 
each of the left and right conditions. 
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Fig. 5. Estimated partial coherence for all pairs of the twelve channels in the analysis 
for the "right" condition (upper triangle) and the "left" condition (lower triangle). 



Hann kernel with smoothing span automatically selected by our procedure. 
The mean smoothing span selected for the trials for the "left" conditions 
was 23.37 with standard deviation 7.74 and for the "right" conditions 22.04 
with standard deviation 7.75. The estimated partial coherences are shown in 
Figure 5. To perform a frequency-band analysis of functional connectivity, we 
computed the partial coherences averaged over the frequencies in the band 
of interest. The partial coherences for each of the alpha band (8-12 Hz) and 
beta band (18-30 Hz) for each of the "left" and "right" conditions are shown 
in Figure 6. 

To test for differences in strength of connectivity over a frequency band, 
we normalized the partial coherence estimates using the Fisher's Z-transform. 
We used the jackknife to estimate the standard error of the estimated partial 
coherences over the bands. This was done by, after leaving out one trial, esti- 
mating partial coherence via the generalized shrinkage estimator of the spec- 
tral density matrix and then using Fisher's Z-transform to normalize these 
jackknifed partial coherence estimates. This allowed us to obtain a jackknife 
sample of size JVl estimates of partial coherence for each of the frequency 
bands for the "left" condition and a jackknife sample of size estimates 
of partial coherence for each of the frequency bands for the "right" condi- 
tion. The point estimates for each of the conditions as well as their standard 
errors, both estimated using a sample size or iVpt, allowed us to create 
t-statistics to test for differences via a two-sample i-test across conditions. 
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Fig. 6. Relative strength of functional connectivity measured by partial coherence at the 
alpha and beta frequency bands for the "right" condition (upper triangle) and "left" con- 
dition (lower triangle). 



These i-statistics are shown in Figure 7. Note that for each frequency band, 
we performed 12 • (12 — l)/2 = 66 tests, so that in all we are performing 132 
tests. To correct for multiple comparisons, we performed our tests control- 
ling for FDR at 0.05. The null hypotheses of no difference across conditions 
that were rejected are marked with an asterisk in Figure 7. 

5.3. Results. On the estimates of the autospectra. The estimated power 
spectral densities for each channel are shown in Figure 8(a) and (b), and the 
estimated shrinkage weight is shown in Figure 9. For the "left" condition, 
the VAR estimator picks up a peak in the autospectra at the very low 
frequencies, and then a smaller peak around 16 Hz. At these frequencies, 
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Fig. 7. The absolute value of the t-statistics for testing for differences between "left" and 
"right" conditions. The t-statistics marked with an asterisk (*) were declared significant 
after adjusting their p -values at FDR level of 0.05. 

the generalized shrinkage estimator disagreed with the VAR estimator, and, 
in fact, the shrinkage weight was truncated to 0.0 at these frequencies. For 
the "right" condition, again the VAR estimator picks up two peaks, one 
around 6 Hz and the other around 24 Hz, and the smoothed periodogram 
oversmoothed these peaks. The estimate of the shrinkage weight was not 
truncated at 0.0 at 4 Hz, and so the estimates given by the generalized 
shrinkage estimator are showing a slight peak. The shrinkage weight was 
much less than 0.5, and so the smoothed periodogram was favored. This 
implies that the VAR model may not be an adequate model. 

On the estimates of partial coherence. Figure 10 shows the estimated 
coherence for all pairs. Recall that coherence is the frequency domain analog 
of squared cross-correlation. Coherence among the fronto-central leads and 
among the occipital leads are strong, in fact, near 1.0 at some frequencies. 
The coherence between the occipital leads and the fronto-central leads is 
smaller. Since coherence captures both direct and indirect connectivity, then 
we can see that the occipital leads are somehow connected with the fronto- 
central leads. Partial coherence captures only the direct connectivity by 
removing the effects of the other leads in the analysis. In Figure 5 we see 
that the strong direct connections are among the fronto-central leads and the 
occipital leads. Moreover, the connections between the fronto-central leads 
and the occipital leads are much weaker after partialization, suggesting that 
the connection between these regions is more likely to be indirect. 

On comparing connectivity between left vs right conditions. When testing 
for differences of the "left" and "right" connections as shown in Figure 7, 
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Fig. 8. Estimated autospectra for each of the two conditions. Shown here are the three 
estimators: the VAR(19) (blue), the smoothed periodogram (black), and the generalized 
shrinkage (red), (a) Estimated autospectra for the "left" condition, (b) Estimated autospec- 
tra for the "right" condition. 




some of the differences in strength of connections were deemed statistically 
significant. However, upon closer inspection, many of these differences may 
be considered to be irrelevant because the estimated strength of connec- 
tion is very weak. For instance, the difference in strength in the 01-FC4 
connection in the beta band had the largest i-statistic and was considered 
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Fig. 10. Estimated coherence for all pairs of the twelve channels in the analysis for the 
"right" condition (upper triangle) and the "left" condition (lower triangle). 
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statistically significant, and yet, the estimated partial coherence values for 
this connection are 0.0225 and 0.0116 for the "left" and "right" conditions, 
respectively. Though this is nearly a 2-fold increase in strength of connection, 
these squared correlation values are small and so the connection may not be 
relevant to the visual-motor task. Among those pairs where the estimated 
partial coherence is larger than 0.05 for at least one of the conditions, differ- 
ences between conditions were significant for the following pairs: C3-FC5, 
P3-C3 and 02-OZ in the alpha band and FC3-FC5, FC3-FC4, C3-FC3 
and P3-C3 in the beta band. 

Of these differences, the only connection where it is stronger in the "left" 
condition is the P3-C3 connection; for the other differences, the "right" con- 
dition yielded stronger connections. An analysis by Bdhm et al. (2010) con- 
cluded that the coherence between C3 and FC3 was the most discriminating 
feature between the two conditions using frequency-domain characteristics 
of the EEG signals. Our jackknife procedure is consistent with this finding 
and even provides additional information, namely, that a measure of the 
direct connection between C3 and FC3 may be used to distinguish between 
"left" and "right" conditions. 

On the limitations of EEG. While EEGs have excellent temporal resolu- 
tion, they are not highly localized in space. Nevertheless, they remain highly 
utilized in many studies because they have excellent temporal resolution and 
thus can be used to probe into brain processes that occur at the millisec- 
ond level. Moreover, they are relatively inexpensive to collect, noninvasive 
and highly portable, and thus have the high potential for brain-computer 
interface applications. 

On the limitations of partial coherence. Partial coherence is a tool for 
investigating second-order dependencies for a given set of channels. Partial 
coherence measures only the strength of direct dependencies and does not 
give information of the direction of the dependencies, which can be captured 
using other metrics of association [e.g., Kaminski et al. (2001)]. Moreover, 
certain regions may be strongly connected but in a nonlinear manner, and 
if this is the case, we would have missed this in the present analysis because 
partial coherence measures only linear associations between the signals and 
the spectral density matrix captures only second-order dependencies. One 
can investigate nonlinear and higher order dependencies in multivariate time 
series using metrics such as mutual information. 

On the importance of the selection of the channels. Since partial coherence 
measures direct dependencies, the choice of the channels to include in the 
analysis is important. We illustrate this by showing how the estimates of the 
partial coherence are affected when signals from more and less channels are 
included in the analysis. 

We first see the effects if channels were removed. Figure 11 shows the 
estimates of partial coherence for P = 8. Compare the estimates of partial 
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Fig. 11. Relative strength of functional connectivity measured by partial coherence at 
the alpha and beta frequency bands for the "right" condition (upper triangle) and "left" 
condition (lower triangle) when P = 8 channels are analyzed. 

coherence between FC3 and FC4 in the alpha band as shown in Figure 11 
with that shown in Figure 6. The partial coherence is larger for the case P = 
8 because the signals from the FC5 and FC6 channels were excluded. Then 
in the 12-channel analysis, when computing partial coherence between FC3 
and FC4, the effects of FC5 were removed from FC4, and since FC5 is highly 
conditionally dependent with FC3 (as shown in Figure 6), this dampened the 
partial coherence between FC3 and FC4. In fact, partial coherence estimates 
are larger for the P = 8 analysis, and this is due to the exclusion of certain 
channels. 

Consider now the situation when more channels are included. Figure 12 
shows the estimates of partial coherence for P = 20. Here, partial coherence 
estimates are smaller due to the inclusion of certain channels. Let us look at 
the partial coherence between P3 and C3 in the alpha band. Now for P = 20, 
partial coherence between P3 and C3 removes the effects of each of C5 and 
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Fig. 12. Relative strength of functional connectivity measured by partial coherence at 
the alpha and beta frequency bands for the "right" condition (upper triangle) and "left" 
condition (lower triangle) when P = 20 channels are analyzed. 

P5. By removing the effects of P5 from C3, one is also removing the effects 
of P3 because P3 and P5 are highly conditionally dependent, and, similarly, 
by removing the effects of C5 from P3, one is removing the effects of C3 
because C3 and C5 are highly conditionally dependent. Thus, the partial 
coherence between P3 and C3 is lower when C5 and P5 are included in the 
analysis. 

This motivates the importance of the choice of channels in the analysis. 
As we have just shown, including or excluding channels can have an effect 
on the analysis. However, one should not include everything in the analysis 
for two reasons. First, there is the problem of high-dimensionality, which can 
lead to problems in inference because there are more connectivity measures 
to estimate. Second, we note that many of these (neighboring) channels are 
highly correlated with each other, which then introduces redundant infor- 
mation. So if all channels were included in the analysis, we expect that many 
of the partial coherence estimates will be dampened down to 0. 

6. Discussion. Our main contribution for spectral analysis in multivari- 
ate time series is a new estimator for estimating the spectral density matrix. 
Our approach simply takes the weighted average of two known estimators, 
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namely, a parametric and nonparametric estimator. We derived the weights 
using a multivariate mean squared error criterion. We have made further de- 
velopments in shrinkage estimation for the spectral density matrix by giving 
freedom in the choice of the shrinkage target and by giving the appropriate 
shrinkage weight for this choice of the shrinkage target. Our shrinkage tar- 
get in this work is the spectral density matrix for a VAR model. We have 
proposed a method to take advantage of the trials of an experiment for es- 
timating the parameters of the VAR model. Our nonparametric estimator 
is the classical smoothed periodogram. We have proposed a method to take 
advantage of the trials of an experiment to select the optimal smoothing 
span of the smoothing kernel. We then outlined a simple method for esti- 
mating the optimal shrinkage weight to construct the generalized shrinkage 
estimator and then evaluated its performance on simulated data sets before 
using it to analyze functional connectivity in an EEG data set. 

The performance of the generalized shrinkage estimator is a function of 
the performance of each of the parametric and nonparametric components. 
In fact, it can be shown that the risk for the generalized shrinkage estimator 
is a weighted average of the risk of each of the two components less a correc- 
tion term for the distance between the two components. If the true spectral 
density matrix can be well approximated by that given by a VAR model, 
then an estimator based on the VAR model alone will outperform the gen- 
eralized shrinkage estimator. However, one can never truly know how well 
the VAR model approximates the true spectral density matrix. Our gener- 
alized shrinkage estimator first fits the VAR model, and then adjusts this 
fit with the nonparametric smoothed periodogram. The optimal shrinkage 
weight is picked by minimizing the risk over all P dimensions of the multi- 
variate time series simultaneously. This can be problematic if there is a wide 
range of dynamics across the dimensions of the multivariate time series. For 
instance, if the autospectra for all but one dimension were flat and there is 
a sharp peak in that one dimension, then though the VAR estimator will 
capture that peak and the smoothed periodogram oversmooths that peak, 
the generalized shrinkage estimator will not give the VAR estimator a lot of 
weight just to capture that one peak. 

There is still a great amount of work to be done with the generalized 
shrinkage estimator. It remains to show the large-sample behavior of the 
generalized shrinkage estimator for estimating the spectral density matrix. 
Large-sample results were given for the shrinkage estimators described by 
Bohm and von Sachs (2008) and Bohm and von Sachs (2009). However, 
their work constrained the class of shrinkage targets; Bohm and von Sachs 
(2009) used the scaled identity matrix as the shrinkage target as a way of 
regularizing the smoothed periodogram and the shrinkage target used by 
Bohm and von Sachs (2008) was specifically the one-factor model. In both 
works they were able to show consistency of their shrinkage estimator. In 
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this work, though we used the VAR model as the shrinkage target, when 
we developed the generalized shrinkage estimator we have refrained from 
imposing conditions on the shrinkage target, and, in fact, our results on 
the optimal shrinkage weight remains valid for any shrinkage target. Recall 
that the number of parameters in a VAR( K) model is of the order KP 2 . It 
may be the case that imposing more constraints to decrease the parameter 
space of the VAR model or considering other shrinkage targets with a low- 
dimensional parameter space will improve the performance of the generalized 
shrinkage estimator. In the future, we would like to investigate the large- 
sample performance of the generalized shrinkage estimator when constraints 
are imposed on the shrinkage target. 

We do not have asymptotic distributions for the estimated partial coher- 
ence in a frequency band via generalized shrinkage. Test statistics in the 
literature have been for partial noncorrelation between two signals so that 
the test for zero partial coherence is for all frequencies. Parametric tests for 
this null hypothesis have been provided by, for example, Dahlhaus, Eichler 
and Sandkiihler (1997) and Dahlhaus (2000). Nonparametric tests, on the 
other hand, are difficult to construct; to create a boostrap distribution, for 
instance, one would have to somehow preserve the correlation structure that 
exists in the other frequency bands that are not of interest. One approach 
is to shuffle the data across trials in order to completely destroy the corre- 
lation structure, but tests on partial coherence over a frequency band using 
this approach will have a larger Type I error than advertised. However, one 
can take advantage of the multiple trials in the experiment and the multiple 
experimental conditions to investigate the differences in connectivity across 
the experimental conditions using nonparametric tests, as we have done here 
using the jackknife. 



The coefficient matrix for the VMA process is as follows. First, let 
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Then the coefficient matrix is 




The coefficient matrices for the VAR process are as follows. Let 

<j) X = (0.75,0.75,0.75). 
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Then the two coefficient matrices are 

*i =diag(>i, (/>!, <f>i,<f>i), * 2 = -0.20-Ii2, $ 3 = 0i 2 , 

$ 4 = -0.15-Ii2 and * 5 = -0.05 • I 12 . 

The noise process Z(i) is a zero-mean 12-dimensional Gaussian process with 
variance-covariance matrix I12. A realization X(i) of the mixture process 
takes the form X(t) = 0.65-X M A(i) + 0.35-X A R(t), where X MA (i) is a VMA 
and Xar(£) is a VAR, and the two are independent. Because the two pro- 
cesses are independent, then the spectral density matrix of the mixture pro- 
cess is the weighted sum of the spectral density matrix of the VMA process 
and the spectral density matrix of the VAR process. 
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